Towards Performance-Portable, Scalable, and Convenient Linear Algebra
Abstract
The rise of multi- and many-core architectures has given birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code compared to vendor-specific solutions, even if only simple operations are to be performed. Moreover, the unified programming framework does not automatically guarantee performance portability of a particular implementation. Thus, device-specific compute kernels are still required to obtain good performance across different hardware architectures. In this work, we address both programmability and portable performance. On the one hand, a high-level programming interface for linear algebra routines allows for the convenient specification of the operations of interest without having to go into the details of the underlying hardware. On the other hand, we discuss the underlying runtime generator for device-specific OpenCL kernels, which is supplemented by an auto-tuning framework for portable performance as well as by work partitioning and task scheduling for multiple devices. Our benchmark results show portable performance across hardware from major vendors. In all cases, at least 75 percent of the performance of the respective vendor-tuned library was obtained, while in some cases we even outperformed the reference. We further demonstrate the convenient and efficient use of our high-level interface in a multi-device setting with good scalability.
Similar resources
A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
pack: A portable linear algebra library for high-performance computers. [4] C. C. Ashcraft. The distributed solution of linear systems using the torus wrap data mapping. Prospectus for the development of a linear algebra library for high performance computers. [14] J.J. Dongarra and R.A. van de Geijn. Reduction to condensed form for the eigenvalue problem on distributed memory architectures. LA...
The Design of Scalable Software Libraries for Distributed Memory Concurrent Computers
This paper describes the design of ScaLAPACK, a scalable software library for performing dense and banded linear algebra computations on distributed memory concurrent computers. The specification of the data distribution has important consequences for interprocessor communication and load balance, and hence is a major factor in determining performance and scalability of the library routines. T...
LAPACK Working Note 43 A Look at Scalable Dense Linear Algebra Libraries
We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object-oriented interface to the library permits more portable applications to ...
OpenCL Evaluation for Numerical Linear Algebra Library Development
With the help of CUDA [7], [6], many applications improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Other than CUDA, there exist other frameworks that allow platform-independent programming for GPUs. The main three frameworks are: 1) ...
A Reliable Generation of High-Performance Matrix Algebra
Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortr...
Publication date: 2013